Double the trouble: handling noise and reverberation in far-field automatic speech recognition
Abstract
Far-field microphone speech signals cause high error rates for automatic speech recognition systems, due to room reverberation and lower signal-to-noise ratios. We have observed large increases in speech recognition word error rates when using a far-field (3-6 feet) microphone in a conference room, in comparison with recordings from close-talking microphones. In an earlier paper, we showed improvements in far-field speech recognition performance using a long-term log spectral subtraction method to combat reverberation. This method is based on a principle similar to cepstral mean subtraction but uses a much longer analysis window (e.g., 1 s) in order to deal with reverberation. Here we show that a combination of short-term noise filtering and long-term log spectral subtraction can further reduce recognition word error rates.
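The core idea of long-term log spectral subtraction can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 8 kHz sample rate, the Hann window, the 50% overlap, and the magnitude floor are all assumptions chosen for illustration. The sketch only computes mean-normalized log-magnitude spectra over 1 s windows. It does not resynthesize a waveform and does not include the short-term noise filtering stage mentioned in the abstract.

```python
import numpy as np


def long_term_log_spectra(signal, sample_rate, window_s=1.0, hop_s=0.5):
    """Log-magnitude spectra from long analysis windows, shape (frames, bins).

    Hypothetical helper: window type, overlap, and floor are assumptions.
    """
    win_len = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    if len(signal) < win_len:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack(
        [signal[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Floor the magnitude so the logarithm is always defined.
    return np.log(np.maximum(magnitude, 1e-10))


def subtract_long_term_mean(log_spectra):
    """Subtract the per-frequency mean over all frames.

    A fixed channel multiplies the spectrum (approximately, when the window
    is long compared with the channel response), so it adds a constant in
    the log domain; removing the mean removes that constant.
    """
    return log_spectra - log_spectra.mean(axis=0, keepdims=True)


# Usage with synthetic data (white noise stands in for speech).
rng = np.random.default_rng(0)
sample_rate = 8000  # assumed sample rate
speech_like = rng.standard_normal(4 * sample_rate)
normalized = subtract_long_term_mean(
    long_term_log_spectra(speech_like, sample_rate)
)
```

As with cepstral mean subtraction, a constant gain applied to the input leaves the normalized output unchanged. The longer window is what lets the same reasoning extend, approximately, to channel responses that last far longer than a typical 20-30 ms analysis frame.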
Similar resources
Reverberation Suppression Based on Sparse Linear Prediction in Noisy Environments
We present a single channel method for late reverberation suppression. The proposed approach estimates late reverberation as a linear combination of previous time-frequency frames. We impose a sparsity constraint on the predictor in order to select the most relevant signal frames for the estimation. The dataset used for the evaluation is corrupted by background noise, thus we propose to jointly...
Techniques for handling convolutional distortion with 'missing data' automatic speech recognition
In this study we describe two techniques for handling convolutional distortion with ‘missing data’ speech recognition using spectral features. The missing data approach to automatic speech recognition (ASR) is motivated by a model of human speech perception, and involves the modification of a hidden Markov model (HMM) classifier to deal with missing or unreliable features. Although the missing ...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Feature mapping using far-field microphones for distant speech recognition
Acoustic modeling based on deep architectures has recently gained remarkable success, with substantial improvement of speech recognition accuracy in several automatic speech recognition (ASR) tasks. For distant speech recognition, the multi-channel deep neural network based approaches rely on the powerful modeling capability of deep neural network (DNN) to learn suitable representation of dista...
Publication date: 2002